Abstract. Search engines are as important as recommender systems for hotel selection. However, the lists recommended by search engines are usually non-personalized and of low accuracy. To deal with these issues, a comprehensive mechanism for hotel recommendation is proposed. In this mechanism, we consider users’ personalized preferences by identifying users’ attributes of interest, trust and consumption capacity, and a quantification method for each attribute is presented using fuzzy theory. Moreover, this paper improves the method of evaluating hotels with respect to the criteria of price, rating, and online reviews, again using fuzzy theory. In addition, the proposed approach uses TOPSIS, a classical multi-criteria decision-making method, to further improve accuracy. Finally, a case study based on Tripadvisor.com is conducted to illustrate the validity of the proposed method for hotel recommendation in search engines. The results of the case study indicate that the method not only solves the problem of non-personalization, but also improves the accuracy of search engines.

1. Introduction


The development of electronic commerce (e-commerce) and social media has changed the way users book hotels. Tourists are accustomed to booking hotels in advance on e-commerce platforms such as HolidayInn.com and ctrip.com. However, the large number, varied prices and uneven quality of hotels make it difficult for users to find satisfactory hotels. Recommender systems can filter the vast amount of information in networks to assist consumers in making the best choices [1-6]. Besides recommender systems, search engines are the other common tool for filtering information and recommending items to consumers on e-commerce platforms. Both recommender systems and search engines can increase the likelihood of online purchasing.

At present, most research focuses on recommender systems, and few studies address how to improve search engines. However, search engines have some obvious issues. First, they usually cannot provide personalized recommendation lists. Second, their item recommendations are mainly based on ratings. Even if consumers are not satisfied with an item, they still tend to give high scores, either out of rating habit or to avoid the malicious harassment that low scores can provoke. As a result, most item ratings are concentrated at 4 or 5 (on a 1-5 scale). Traditional recommendation methods usually rank hotels by their average ratings, and the differences in average rating among hotels are often slight. In these circumstances, it is difficult to distinguish the quality of hotels [7] on the basis of average ratings, which in turn lowers the accuracy of search engine recommendations. In this study, we propose a new method to deal with the ratings and a comprehensive hotel recommendation method to improve the performance of search engines.

The basic framework of consumers’ purchasing decision-making in online shopping is the same as that in offline shopping. The decision-making process of item selection is the most critical stage of a consumer’s online shopping [8]. At this stage, consumers need to compare, analyze and evaluate the selected items according to their purchase criteria, so that they can judge the quality of the items and form a final evaluation. Research on current recommendation methods often focuses on the demand-evoking stage and ignores the process of consumer decision-making. With search engines, consumers state their demands directly by entering keywords, so there is relatively little need to identify those demands. Consequently, the decision-making process of item selection is more important in methods for search engines than in traditional recommendation methods. In this study, we improve search engines by studying the decision-making process of item selection.

The decision-making process of item selection is influenced by many factors [9-11]. The rating of an item is the most widely used criterion for item selection [12, 13]. However, most studies use only a single comprehensive score, which results in a serious loss of information [14, 15]. In recent years, with the development of text analysis technology, several studies have combined ratings and online reviews to select items [16, 17].

As online reviews are in the form of text, which contains more information than ratings, more and more scholars have studied the impact of online reviews and applied them to improve the performance of recommender systems [18-21]. Most of them use online reviews to identify the preferences of users or the characteristics of items by extracting keywords, and then use these keywords as the properties of items or users [22, 23]. Other studies use online reviews to evaluate items [24]. Zhang et al. [25] proposed a method to quantify online reviews using neutrosophic theory. However, their study did not actually translate online reviews into neutrosophic numbers; it translated ratings into interval-valued neutrosophic numbers instead. In this study, building on the research in [25], we use sentiment analysis technology to transform online reviews into single valued neutrosophic numbers (SVNNs).

Apart from ratings and online reviews, price has been shown to be a pivotal factor influencing consumers’ decision-making [26-30]. However, few studies have applied price in product or hotel recommendation. Largely because of differences in regional economy and personal income, different consumers may have different opinions on whether the price of a hotel is expensive. Consequently, the raw value of the price cannot be used directly to evaluate or recommend hotels. Besides, owing to factors such as social position, vanity and income, different consumers have different preferences for product prices. What is certain is that, for the same consumer, consumption capacity remains consistent over a long period. The higher the similarity between the hotel’s price and the customer’s consumption capacity, the greater the likelihood that the customer books the hotel.

Based on the research outlined above, in the decision-making process of item selection we combine the impact of ratings, online reviews and price to improve the accuracy of search engines.

In the field of hotel recommendation, some studies are devoted to mining hotel evaluation rules. Yu and Chang [31] designed a hotel evaluation rule with five criteria: distance, traveler preference, room rate, facilities, and rating. Levi et al. [32] mined hotel reviews to determine the importance of each hotel criterion. Some studies designed personalized hotel recommendation methods by assigning different weights to different criteria based on the target user’s preferences [33, 34]. Nevertheless, most existing research gives the same weight to all users who have commented on a hotel. In fact, for a target user, the ratings and online reviews of similar users have more influence than those of dissimilar users. Therefore, in this study we mine users’ preferences, identify the similar group, and adjust the weights of groups with different similarities to the target user in terms of ratings and online reviews, in order to deal with the problem of non-personalization in search engines.

Fuzzy tools have been used to model uncertain and vague preferences in recommender systems [34, 35]. The fuzzy sets used in different studies usually differ because of the different data forms and recommendation strategies. Considering the features of the collected data and of each fuzzy set, this paper uses an appropriate fuzzy set to quantify prices, ratings, and online reviews. Because the form of the information about each hotel evaluation criterion is different, the quantification method varies considerably across criteria. For this reason, directly aggregating the criterion evaluation values to sort hotels is not efficient. Instead, we rank and recommend hotels using the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method, which is designed for multi-criteria decision-making problems [36-39].
The remainder of this paper is organized as follows. In Section 2, we briefly review some basic concepts related to this study. Section 3 proposes a comprehensive mechanism for hotel recommendation. To verify the feasibility of the method, Section 4 conducts a case study using data from TripAdvisor.com. Finally, Section 5 concludes the study and suggests directions for future research.


2. Basic concepts


In this section, we briefly review the definitions of the intuitionistic fuzzy set (IFS), the single valued neutrosophic set (SVNS), and the hesitant fuzzy linguistic term set (HFLS), as well as their operations and distance measures. These definitions will be used in the proposed hotel recommendation method.


Definition 1. [40] Let $X = \{x_1, x_2, \ldots, x_n\}$ be a nonempty classical set. An intuitionistic fuzzy set defined on $X$ is represented as $A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in X\}$, where $\mu_A(x): X \to [0, 1]$ and $\nu_A(x): X \to [0, 1]$ are the membership degree and non-membership degree of $x$ in $A$, and $0 \le \mu_A(x) + \nu_A(x) \le 1$.

For every IFS on $X$, $\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)$ is the intuitionistic index of the element $x$ in $A$, and $0 \le \pi_A(x) \le 1$, $x \in X$.


Definition 2. [40] Let $A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in X\}$ and $B = \{(x, \mu_B(x), \nu_B(x)) \mid x \in X\}$ be two IFSs on $X$, and let $\lambda > 0$. The operations on $A$ and $B$ are defined as:

1) $A + B = \{(x, \mu_A(x) + \mu_B(x) - \mu_A(x)\mu_B(x), \nu_A(x)\nu_B(x)) \mid x \in X\}$,
2) $A \cdot B = \{(x, \mu_A(x)\mu_B(x), \nu_A(x) + \nu_B(x) - \nu_A(x)\nu_B(x)) \mid x \in X\}$,
3) $\lambda A = \{(x, 1 - (1 - \mu_A(x))^{\lambda}, (\nu_A(x))^{\lambda}) \mid x \in X\}$,
4) $A^{\lambda} = \{(x, (\mu_A(x))^{\lambda}, 1 - (1 - \nu_A(x))^{\lambda}) \mid x \in X\}$.


Definition 3. [41] Let $A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in X\}$ and $B = \{(x, \mu_B(x), \nu_B(x)) \mid x \in X\}$ be two IFSs on $X$. The normalized Euclidean distance between $A$ and $B$ is defined as:

$$d_{IFS}(A, B) = \sqrt{\frac{1}{2n} \sum_{i=1}^{n} \left[ (\mu_A(x_i) - \mu_B(x_i))^2 + (\nu_A(x_i) - \nu_B(x_i))^2 + (\pi_A(x_i) - \pi_B(x_i))^2 \right]}. \quad (1)$$

Definition 4. [42] Let $X$ be a universe of discourse. A single valued neutrosophic set $a$ in $X$ is defined as $a = \{(x, T_a(x), I_a(x), F_a(x)) \mid x \in X\}$, where $T_a(x)$, $I_a(x)$ and $F_a(x)$ are the degrees of truth-membership, indeterminacy-membership and falsity-membership, respectively. The values of $T_a(x)$, $I_a(x)$ and $F_a(x)$ lie in $[0, 1]$ for all $x \in X$. An element of the set $a$ is called a single valued neutrosophic number (SVNN).


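Definition 5. [43] Let $A = (T_A, I_A, F_A)$ and $B = (T_B, I_B, F_B)$ be two SVNNs, and let $\lambda > 0$. The following operations are defined:

1) $A + B = (T_A + T_B - T_A T_B,\ I_A + I_B - I_A I_B,\ F_A + F_B - F_A F_B)$,

2) $A \cdot B = (T_A T_B,\ I_A I_B,\ F_A F_B)$,

3) $\lambda A = (1 - (1 - T_A)^{\lambda},\ 1 - (1 - I_A)^{\lambda},\ 1 - (1 - F_A)^{\lambda})$, and

4) $A^{\lambda} = (T_A^{\lambda},\ I_A^{\lambda},\ F_A^{\lambda})$.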


Definition 6. [44] Let $A = (T_A, I_A, F_A)$ and $B = (T_B, I_B, F_B)$ be two SVNNs. The Euclidean distance between $A$ and $B$ is defined as:

$$d_N(A, B) = \sqrt{\frac{1}{3} \left( (T_A - T_B)^2 + (I_A - I_B)^2 + (F_A - F_B)^2 \right)}. \quad (2)$$
Definition 7. [45] A subscript-symmetric linguistic evaluation scale is defined as follows: $S = \{s_{-t}, \ldots, s_{-1}, s_0, s_1, \ldots, s_t\}$.

Definition 8. [45] Let $s_a$ and $s_b$ be two linguistic terms in $S$. The distance measure between $s_a$ and $s_b$ is:

$$d(s_a, s_b) = \frac{|a - b|}{2t + 1}. \quad (3)$$


Definition 9. [46] Let $S = \{s_{-t}, \ldots, s_{-1}, s_0, s_1, \ldots, s_t\}$ be a linguistic scale set including $2t + 1$ linguistic terms. A set of finite ordered elements of $S$ is called a hesitant fuzzy linguistic term set (HFLS), denoted as $H_S$: $H_S = \{s_{\beta_i} \mid s_{\beta_i} \in S,\ i = 1, 2, \ldots, n\}$, where $s_{\beta_i}$ is a linguistic term and an element of $H_S$, $\beta_i$ is the subscript of $s_{\beta_i}$ with $-t \le \beta_i \le t$, and $n$ is the number of elements in $H_S$.


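Definition 10. [47] Let $H_S^1 = \{s_{\beta_i^1} \mid i = 1, 2, \ldots, n\}$ and $H_S^2 = \{s_{\beta_i^2} \mid i = 1, 2, \ldots, n\}$ be two HFLSs. The Hamming distance between $H_S^1$ and $H_S^2$ is defined as:

$$d(H_S^1, H_S^2) = \frac{1}{n} \sum_{i=1}^{n} \frac{|\beta_i^1 - \beta_i^2|}{2t + 1}. \quad (4)$$

When the numbers of elements in the two sets are different, the set with fewer elements is complemented until both sets have the same number of elements.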
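
The distance in Equation (4) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, each HFLS is represented by the list of its subscripts, and because the rule for complementing the shorter set is not spelled out here, the sketch simply requires both sets to have the same length.

```python
from typing import Sequence


def hfls_hamming_distance(beta1: Sequence[int], beta2: Sequence[int], t: int) -> float:
    """Hamming distance between two HFLSs given by their subscripts (Equation (4)).

    Assumption: both sets already have the same number of elements; the
    complementing rule for unequal lengths is not implemented here.
    """
    if len(beta1) != len(beta2):
        raise ValueError("complement the shorter set before computing the distance")
    if not beta1:
        raise ValueError("an HFLS must contain at least one linguistic term")
    if any(not -t <= b <= t for b in (*beta1, *beta2)):
        raise ValueError("every subscript must lie in [-t, t]")
    # An HFLS is an ordered set, so compare the elements in sorted order.
    pairs = zip(sorted(beta1), sorted(beta2))
    return sum(abs(a - b) for a, b in pairs) / (len(beta1) * (2 * t + 1))


# {s_1, s_2} versus {s_2, s_3} on a nine-term scale (t = 4): (1 + 1) / (2 * 9)
print(hfls_hamming_distance([1, 2], [2, 3], t=4))
```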


3. A comprehensive mechanism for hotel 
recommendation 


In this section, we propose a comprehensive hotel recommendation method to improve the performance of search engines. The framework of the proposed method is shown in Fig. 1.

The details of the proposed method are described in the rest of this section.


3.1. The quantification of user attribute 


In a socialized business environment, user data have increased exponentially. A user’s purchasing records, browsing records, ratings, online reviews, tags and other data reflect, to some extent, the user’s interest, trust, consumption capacity and other personalized information. How to use this massive unstructured information to mine users’ attribute information accurately and effectively is one of the key issues in this study. In this part, we introduce the quantification methods for user attributes in detail.


[Figure: inputs (keywords, user information, hotel information); search for hotels using the search engine to obtain the alternative hotels; quantify user attributes and define user similarity groups; quantify hotel evaluation criteria (price, rating, online review).]
Fig. 1. The framework of the proposed method.


User interest. User interest directly reflects a consumer’s purchase preference. Mining users’ interests is one of the indispensable steps in providing users with accurate recommendations. This study extracts the user interest attribute from the user’s purchasing records and online reviews.

The steps of interest recognition are as follows. 


(i) For user $u$, gather the titles of all hotels at which user $u$ has stayed, together with the user’s online reviews, to form a long text, denoted as $T_u$.

(ii) For each long text, eliminate irrelevant words such as stop words, and carry out word segmentation and word frequency statistics. Then classify the high-frequency words. Because the same interest can be expressed in different ways, the high-frequency words must be sorted further: group synonyms into one category, and use each category as a feature of user $u$’s interest, denoted as $f_u$.

(iii) For each $T_u$, count the frequency with which each feature $f_u$ appears in the text $T_u$, denoted as $freq_u$. The interest attribute of user $u$ can then be represented by the collection of all the features, $F(u) = \{f_{u1}, f_{u2}, \ldots, f_{ui}, \ldots, f_{un}\}$, where $f_{ui}$ is the $i$th feature of interest for user $u$, and $freq(u) = \{freq_{u1}, freq_{u2}, \ldots, freq_{ui}, \ldots, freq_{un}\}$ is the set of frequencies corresponding to the features in $F(u)$.
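
Steps (i)-(iii) can be sketched as follows. The stop-word list, the synonym-to-feature map and the function name are hypothetical placeholders, and the use of relative frequencies (so that every value lies in [0, 1]) is an assumption rather than something the steps above prescribe.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical resources; a real system would use a full stop-word list
# and a synonym dictionary built from the corpus.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "in", "to", "of", "very"}
SYNONYM_TO_FEATURE = {
    "staff": "service", "helpful": "service",
    "clean": "cleanliness", "spotless": "cleanliness",
    "location": "location", "central": "location",
}


def interest_features(long_text: str) -> Dict[str, float]:
    """Map the long text T_u to {feature: relative frequency}."""
    # Step (ii): word segmentation and stop-word removal.
    words = [w for w in re.findall(r"[a-z]+", long_text.lower()) if w not in STOP_WORDS]
    # Step (ii): merge synonyms into one feature category.
    counts = Counter(SYNONYM_TO_FEATURE[w] for w in words if w in SYNONYM_TO_FEATURE)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Step (iii): frequency of each feature (assumed to be relative frequency).
    return {feature: count / total for feature, count in counts.items()}


print(interest_features("The staff was helpful and the room was spotless. Central location."))
```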


User trust. At present, the number of users on e-commerce websites is huge. Some of them are false users, whose purchasing and commenting behaviors are irregular and untruthful. The online reviews and ratings of these users are unconvincing: their trust degrees are low, and recommendations based on their historical records are inaccurate. The lower the trust of a user, the lower the user’s reliability, and the lower the accuracy of recommendations based on his or her historical records. Therefore, user trust is also important for identifying the similar group, and the users in the similar group should have a high degree of trust. Many e-commerce platforms have mechanisms to evaluate user trust; for example, Fig. 2 shows the mechanism on Tripadvisor.com. The more “helpful votes” a user’s reviews receive, the more trustworthy the user is. By comparison, some e-commerce platforms do not have such mechanisms.

[Figure: profile of user “golfer326” on Tripadvisor.com, showing “9 Ratings” (how many items the user has commented on) and “2 Helpful votes” (how many users think the reviews are useful).]
Fig. 2. The mechanism to evaluate user’s trust on Tripadvisor.com.

For the former, the user’s trust is standardized as:

$$trust_u = \frac{t_u}{\max(t)}, \quad (5)$$

where $t_u$ is the trust of user $u$ on the e-commerce platform and $\max(t)$ is the maximum value among all users’ trust. For the latter, a method for quantifying trust needs to be studied further in the future.
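
A minimal sketch of Equation (5) follows. The function names are hypothetical. Two points are assumptions inferred from the sample rows of Table 2 rather than stated rules: that the raw trust $t_u$ is the number of helpful votes divided by the number of commented hotels, and that the maximum raw trust in the full data set is 14.

```python
def raw_trust(helpful_votes: int, n_reviews: int) -> float:
    """Raw trust t_u, assumed here to be helpful votes per commented hotel."""
    return helpful_votes / n_reviews if n_reviews else 0.0


def standardized_trust(t_u: float, t_max: float) -> float:
    """Equation (5): divide the raw trust by the largest raw trust of all users."""
    if t_max <= 0:
        raise ValueError("the maximum trust must be positive")
    return t_u / t_max


# Sample users of Table 2, with the assumed maximum raw trust of 14.
for name, number, vote in [("brooket986", 3, 1), ("chickenlips", 2, 2), ("Zena B", 13, 12)]:
    print(name, round(standardized_trust(raw_trust(vote, number), 14.0), 4))
```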


Fig. 3. Keywords representing users’ interests.


User consumption capacity. Similar users have similar consumption capacities. A user’s consumption capacity is reflected by the prices of the hotels at which the user has stayed. In this study, we use linguistic term sets to express users’ consumption capacities.

Firstly, assume that there is a set $S$ of nine linguistic terms to depict the price, where $S = \{s_{-4}, s_{-3}, s_{-2}, s_{-1}, s_0, s_1, s_2, s_3, s_4\}$, with $s_{-4}$ = “Incomparable Cheap”, $s_{-3}$ = “Significantly Cheap”, $s_{-2}$ = “Cheap”, $s_{-1}$ = “Somewhat Cheap”, $s_0$ = “Medially”, $s_1$ = “Somewhat Expensive”, $s_2$ = “Expensive”, $s_3$ = “Significantly Expensive” and $s_4$ = “Certainly Expensive”.

Secondly, collect the price distributions of different cities, and divide the hotel prices in each city into 9 levels corresponding to the linguistic terms in $S$. For the same city, the higher the prices of the hotels a user has booked, the stronger the user’s consumption capacity. Finally, for each user $u$, the consumption capacity is denoted as $H_u$, in the form of a linguistic term set.
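
One possible way to turn prices into linguistic levels is sketched below. How the nine levels are cut is not specified above, so the equal-frequency split (quantiles of the city's price distribution) is purely an assumption, as are the function names and the sample data.

```python
import bisect
import statistics
from typing import Dict, List, Sequence, Tuple


def price_cut_points(city_prices: Sequence[float]) -> List[float]:
    """Eight cut points that split one city's prices into nine levels.

    Assumption: equal-frequency levels; other splitting rules are possible.
    """
    return statistics.quantiles(city_prices, n=9)


def price_level(price: float, cut_points: Sequence[float]) -> int:
    """Subscript (-4..4) of the linguistic term s_alpha for a price."""
    return bisect.bisect_right(cut_points, price) - 4


def consumption_capacity(
    bookings: Sequence[Tuple[str, float]], cuts_by_city: Dict[str, Sequence[float]]
) -> List[int]:
    """Subscripts of the linguistic term set H_u from (city, price) bookings."""
    return sorted(price_level(price, cuts_by_city[city]) for city, price in bookings)


# Hypothetical city whose hotel prices are 1, 2, ..., 90.
cuts = {"X": price_cut_points(list(range(1, 91)))}
print(consumption_capacity([("X", 1), ("X", 45.5), ("X", 90)], cuts))
```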


3.2. The definition of similar group 


In this study, we define consumers with high similarity in terms of interest, trust, and consumption capacity as a similar group; such consumers tend to have similar purchasing preferences.

(1) Generally, the more similar the characteristics of the hotels two users choose and the aspects they comment on, the higher the preference similarity between the users. The interest preference similarity between target user $u$ and user $v$ can be computed as follows:

$$Sim\_int(u, v) = \begin{cases} \dfrac{\sum_{j=1}^{m} \left(1 - |freq_{uj} - freq_{vj}|\right)}{\ldots}, & N_u \neq 0 \text{ and } N_v \neq 0, \\ 0, & N_u = 0 \text{ or } N_v = 0, \end{cases} \quad (6)$$

where $N_u$ and $N_v$ are the numbers of interest features for user $u$ and user $v$; $j$ denotes the $j$th common interest feature of user $u$ and user $v$; $m$ is the total number of common interest features of user $u$ and user $v$; and $freq_{uj}$ and $freq_{vj}$ are the frequencies of the $j$th common interest feature in the interest features of user $u$ and user $v$, respectively.

(2) The greater a user’s trust value, the more the target user trusts that user, and hence the higher the trust similarity between the target user and that user. In addition, the greater the trust value of the target user, the higher the trust required of the users in the similar group. The trust similarity is calculated as:

$$Sim\_trust(u, v) = \frac{(1 + \min(trust_u, trust_v)) \times trust_v}{1 + trust_u}, \quad (7)$$

where $trust_u$ and $trust_v$ denote the trust degrees of user $u$ and user $v$, and $\min(trust_u, trust_v)$ is the smaller of the two.

(3) For user $u$ and user $v$, whose consumption capacities are denoted as $H_u$ and $H_v$, the consumption capacity similarity is defined as

$$Sim\_c(H_u, H_v) = 1 - d(H_u, H_v), \quad (8)$$

where $d(H_u, H_v)$ is the distance between HFLSs given in Definition 10.

Then, for user $u$ and user $v$, the comprehensive similarity is calculated as follows:

$$Sim\_all(u, v) = \frac{Sim\_int(u, v) + Sim\_trust(u, v) + Sim\_c(H_u, H_v)}{3}. \quad (9)$$
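
The trust similarity, the consumption capacity similarity and the comprehensive similarity (Equations (7)-(9)) can be sketched as below; the function names are hypothetical, and the interest similarity and the HFLS distance are taken as given inputs. Note that when the target user's trust does not exceed the other user's trust, Equation (7) reduces to $trust_v$.

```python
def sim_trust(trust_u: float, trust_v: float) -> float:
    """Equation (7): trust similarity of user v as seen by target user u."""
    return (1.0 + min(trust_u, trust_v)) * trust_v / (1.0 + trust_u)


def sim_consumption(hfls_distance: float) -> float:
    """Equation (8): one minus the HFLS distance between H_u and H_v."""
    return 1.0 - hfls_distance


def sim_all(sim_interest: float, sim_tr: float, sim_c: float) -> float:
    """Equation (9): arithmetic mean of the three similarities."""
    return (sim_interest + sim_tr + sim_c) / 3.0


# Target trust 0.0238 and other user's trust 0.0714 (values of the kind listed in Table 2).
s_trust = sim_trust(0.0238, 0.0714)
print(round(s_trust, 4), round(sim_all(0.0, s_trust, 0.0), 4))
```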


According to the comprehensive similarity of users, we divide the users into three groups using two thresholds $\theta_1$ and $\theta_2$ ($\theta_1 > \theta_2$):


• Similar group ($G_1$): a user whose similarity is higher than $\theta_1$ is a member of the similar group.

• Weak similar group ($G_2$): a user whose similarity is less than $\theta_1$ and higher than $\theta_2$ is a member of the weak similar group.

• Dissimilar group ($G_3$): a user whose similarity is less than $\theta_2$ is a member of the dissimilar group.

The equations identifying each group are as follows:


$$G_1 = \{v \mid Sim\_all(u, v) > \theta_1\}, \quad (10)$$

$$G_2 = \{v \mid \theta_2 < Sim\_all(u, v) < \theta_1\}, \quad (11)$$

$$G_3 = \{v \mid Sim\_all(u, v) < \theta_2\}. \quad (12)$$

In this study, the weight of each group $G_i$ ($i = 1, 2, 3$) is defined as follows:

$$w_i = \frac{sim\_average(u, G_i)}{\sum_{l=1}^{3} sim\_average(u, G_l)}, \quad (13)$$

where user $u$ is the target user and $sim\_average(u, G_i)$ is the average of the similarities between the members of group $G_i$ and the target user.
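
The grouping of Equations (10)-(12) and the weights of Equation (13) can be sketched as follows. The threshold values, user labels and function names are hypothetical; assigning a similarity that falls exactly on a threshold to the lower group, and giving an empty group an average of zero, are assumptions made only so that the sketch is total.

```python
from typing import Dict, List, Sequence


def split_groups(similarity: Dict[str, float], theta1: float, theta2: float) -> List[List[str]]:
    """Split users into G1, G2, G3 by their similarity to the target user."""
    g1 = [v for v, s in similarity.items() if s > theta1]
    g2 = [v for v, s in similarity.items() if theta2 < s <= theta1]  # ties: assumption
    g3 = [v for v, s in similarity.items() if s <= theta2]           # ties: assumption
    return [g1, g2, g3]


def group_weights(similarity: Dict[str, float], groups: Sequence[Sequence[str]]) -> List[float]:
    """Equation (13): normalized average similarity of each group."""
    averages = [sum(similarity[v] for v in g) / len(g) if g else 0.0 for g in groups]
    total = sum(averages)
    if total == 0:
        raise ValueError("all group averages are zero; weights are undefined")
    return [a / total for a in averages]


sims = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}  # hypothetical Sim_all values
groups = split_groups(sims, theta1=0.6, theta2=0.2)
print(groups, group_weights(sims, groups))
```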


3.3. The quantification of hotel evaluation 
criteria 


To improve the accuracy of recommendations, this study comprehensively considers three factors as hotel evaluation criteria: rating, online reviews and price.

Rating. The rating reflects the degree of a user’s satisfaction with a hotel. For a given hotel, some users give high ratings while others give low ratings, so the average rating of all users conveys little information. To retain the rating preferences of different users as much as possible, for each group we use an intuitionistic fuzzy number to express the satisfaction of the group with the hotel:

$$IF(G_i) = (\mu(G_i), \nu(G_i)). \quad (14)$$

On most e-commerce platforms, the rating scale ranges from 1 to 5. A score of 4 or 5 indicates that the user likes the hotel, and a score of 1 or 2 indicates that the user does not like it. Therefore, when the scale ranges from 1 to 5, the proportions of ratings above 3 and below 3 in the group give the degree of membership $\mu(G_i)$ and the degree of non-membership $\nu(G_i)$ towards the hotel, respectively. Here $G_1$ denotes the similar group, $G_2$ the weak similar group and $G_3$ the dissimilar group.

Compared with users of low similarity, users of high similarity have a greater impact on the purchasing decision of the target user. The ratings and online reviews of the similar group are therefore more important than those of the weak similar group or the dissimilar group. For each group ($G_1$, $G_2$, $G_3$), the weight is calculated by Equation (13). Assigning different weights to different groups not only makes the result more accurate than directly averaging all users’ ratings and online reviews, but also personalizes the recommendation result.

The comprehensive rating of all users towards the hotel is $IF(G)$:

$$IF(G) = \sum_{i=1}^{3} w_i\, IF(G_i). \quad (15)$$
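
Equations (14) and (15) can be illustrated with the sketch below. It assumes that the sum and the scalar multiple in Equation (15) are the standard intuitionistic fuzzy operations, $A + B = (\mu_A + \mu_B - \mu_A\mu_B,\ \nu_A\nu_B)$ and $\lambda A = (1 - (1 - \mu_A)^{\lambda},\ \nu_A^{\lambda})$, which give the closed form used in `aggregate_ifn`; the function names and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple

IFN = Tuple[float, float]  # (membership, non-membership)


def group_ifn(ratings: Sequence[float]) -> IFN:
    """Equation (14): share of ratings above 3 and share of ratings below 3."""
    if not ratings:
        raise ValueError("a group needs at least one rating")
    n = len(ratings)
    return sum(r > 3 for r in ratings) / n, sum(r < 3 for r in ratings) / n


def aggregate_ifn(values: Sequence[IFN], weights: Sequence[float]) -> IFN:
    """Equation (15) under the standard IFS operations (an assumption)."""
    mu = 1.0 - math.prod((1.0 - m) ** w for (m, _), w in zip(values, weights))
    nu = math.prod(n ** w for (_, n), w in zip(values, weights))
    return mu, nu


g1 = group_ifn([5, 5, 4, 3])   # hypothetical ratings of the similar group
g2 = group_ifn([5, 4, 3, 2])
g3 = group_ifn([4, 2, 2, 1])
print(aggregate_ifn([g1, g2, g3], [0.5, 0.3, 0.2]))
```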


Online reviews. One online review may contain positive, neutral and negative evaluations at the same time. To depict online reviews appropriately and reduce the loss of information, we use single valued neutrosophic numbers to express them. According to the definition of SVNNs, the online review of user $u$ on hotel $i$ is expressed as $R_{ui} = (T_{ui}, I_{ui}, F_{ui})$, where $T$ indicates the degree of positive sentiment, $I$ the degree of neutral sentiment and $F$ the degree of negative sentiment in the review.

Further, we regard all users in the same group as one virtual user, so that the online reviews of the users in the same group are regarded as one online review. The online review of group $G_i$ is denoted as:

$$R(G_i) = (T(G_i), I(G_i), F(G_i)). \quad (16)$$

Then all users’ online reviews are aggregated as:

$$R(G) = \sum_{i=1}^{3} w_i R(G_i). \quad (17)$$
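
Under the operations of Definition 5, where $A + B$ gives $T_A + T_B - T_A T_B = 1 - (1 - T_A)(1 - T_B)$ and $\lambda A$ gives $1 - (1 - T_A)^{\lambda}$ (and likewise for $I$ and $F$), the weighted sum in Equation (17) has a simple closed form. The sketch below implements that form; the function name and the sample group reviews are hypothetical, and how each $R(G_i)$ is obtained by sentiment analysis is outside its scope.

```python
import math
from typing import Sequence, Tuple

SVNN = Tuple[float, float, float]  # (truth, indeterminacy, falsity)


def aggregate_svnn(values: Sequence[SVNN], weights: Sequence[float]) -> SVNN:
    """Equation (17) with the SVNN operations of Definition 5.

    Each component c of the result equals 1 - prod_i (1 - c_i) ** w_i.
    """
    def combine(k: int) -> float:
        return 1.0 - math.prod((1.0 - v[k]) ** w for v, w in zip(values, weights))

    return combine(0), combine(1), combine(2)


# Hypothetical group reviews R(G1), R(G2), R(G3) and group weights.
reviews = [(0.8, 0.1, 0.1), (0.6, 0.3, 0.1), (0.3, 0.3, 0.4)]
print(aggregate_svnn(reviews, [0.5, 0.3, 0.2]))
```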


Price. The quantification method for price is similar to that for consumption capacity in Section 3.1. First, define the set of linguistic terms that represent the price. Assume that there is a set $S$ of nine linguistic terms to depict the price, where $S = \{s_{-4}, s_{-3}, s_{-2}, s_{-1}, s_0, s_1, s_2, s_3, s_4\}$, with $s_{-4}$ = “Incomparable Cheap”, $s_{-3}$ = “Significantly Cheap”, $s_{-2}$ = “Cheap”, $s_{-1}$ = “Somewhat Cheap”, $s_0$ = “Medially”, $s_1$ = “Somewhat Expensive”, $s_2$ = “Expensive”, $s_3$ = “Significantly Expensive” and $s_4$ = “Certainly Expensive”. Then collect the price distribution of hotels in each city, and divide the prices into 9 levels corresponding to the linguistic terms for that city. The price of each hotel can then be expressed as a linguistic term, denoted as $s_\alpha$. Let the evaluation of price for hotel $i$ be denoted as $P_i$:

$$P_i = 1 - \frac{|\alpha - \bar{\alpha}|}{9}, \quad (18)$$

where $s_\alpha$ is the linguistic term of the hotel’s price and $\bar{\alpha}$ is the average value of the subscripts in the linguistic term set that expresses the target user’s consumption capacity.
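
As a small illustration of Equation (18), the sketch below evaluates a hotel's price level against a user's consumption capacity. The function name and sample values are hypothetical; the denominator 9 is written as $2t + 1$ with $t = 4$, matching the nine-term scale.

```python
from typing import Sequence


def price_evaluation(alpha: int, capacity_subscripts: Sequence[int], t: int = 4) -> float:
    """Equation (18): closeness of a hotel's price level to the user's capacity.

    alpha is the subscript of the hotel's price term; capacity_subscripts are
    the subscripts of the terms in the target user's consumption capacity H_u.
    """
    if not capacity_subscripts:
        raise ValueError("the consumption capacity must contain at least one term")
    alpha_bar = sum(capacity_subscripts) / len(capacity_subscripts)
    return 1.0 - abs(alpha - alpha_bar) / (2 * t + 1)


# A user who usually books s_1..s_3 hotels, evaluated against s_2 and s_-4 hotels.
print(price_evaluation(2, [1, 2, 3]), price_evaluation(-4, [1, 2, 3]))
```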


3.4. The model of recommendation based on TOPSIS


Because the three criteria for evaluating hotels are expressed in different forms, the criterion values cannot be aggregated directly by aggregation operators. The TOPSIS method aggregates all criteria on the basis of distance measures and sorts the alternatives according to their degree of closeness to the ideal solutions, so it imposes no special requirement on the data form of each criterion value. In this part, we sort hotels by the TOPSIS method and form the final recommended list. The steps are as follows:

(1) Obtain the positive ideal solution and the negative ideal solution.

The positive ideal solution consists of the positive ideal value of each criterion, and the negative ideal solution consists of the negative ideal value of each criterion. For the rating criterion, the positive ideal value is $(1, 0)$, which means that all users rate the hotel highly (4 or 5 on a scale of 1-5). For the online review criterion, the positive ideal value is $(1, 0, 0)$, which means that all online reviews contain only positive comments, so that the truth-membership degree is equal to 1. For the price criterion, the positive ideal value is 1, which means that the similarity between the price of the hotel and the consumption capacity of the user is 1. Under each criterion, the negative ideal value is the complete opposite of the positive ideal value.

Then the positive ideal solution $A^+$ and the negative ideal solution $A^-$ are described as follows:


$$A^+ = \{(1, 0), (1, 0, 0), 1\}, \quad (19)$$


$$A^- = \{(0, 1), (0, 1, 1), 0\}. \quad (20)$$


(2) Calculate the distances. For each alternative hotel $i$, calculate the distance $D_i^+$ between the alternative and the positive ideal solution, as well as the distance $D_i^-$ between the alternative and the negative ideal solution. For the criteria of rating, online reviews and price, the distances can be calculated based on Equations (1), (2) and (4), respectively.

(3) Calculate the Close Index $C_i$:

$$C_i = \frac{D_i^-}{D_i^+ + D_i^-}. \quad (21)$$

(4) Rank the hotels based on the Close Index. The greater the value of a hotel’s Close Index, the higher its recommended ranking. Then recommend the hotels to the target user $u$ based on the new ranking list.
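
The TOPSIS steps can be sketched as follows. Several points are assumptions, not statements of the method above: the three criterion distances are summed with equal weight to obtain $D_i^+$ and $D_i^-$, the price distance is taken as the absolute difference between $P_i$ and the ideal value, and all names and sample hotels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (rating IFN, review SVNN, price evaluation P_i)
Hotel = Tuple[Tuple[float, float], Tuple[float, float, float], float]

POSITIVE_IDEAL: Hotel = ((1.0, 0.0), (1.0, 0.0, 0.0), 1.0)  # Equation (19)
NEGATIVE_IDEAL: Hotel = ((0.0, 1.0), (0.0, 1.0, 1.0), 0.0)  # Equation (20)


def ifn_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Equation (1) for a single element (n = 1)."""
    pi_a, pi_b = 1.0 - a[0] - a[1], 1.0 - b[0] - b[1]
    return math.sqrt(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (pi_a - pi_b) ** 2) / 2.0)


def svnn_distance(a: Tuple[float, float, float], b: Tuple[float, float, float]) -> float:
    """Equation (2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def total_distance(hotel: Hotel, ideal: Hotel) -> float:
    """Sum of the three criterion distances (equal weighting is an assumption)."""
    return (ifn_distance(hotel[0], ideal[0])
            + svnn_distance(hotel[1], ideal[1])
            + abs(hotel[2] - ideal[2]))  # price distance: assumption


def close_index(hotel: Hotel) -> float:
    """Equation (21)."""
    d_pos = total_distance(hotel, POSITIVE_IDEAL)
    d_neg = total_distance(hotel, NEGATIVE_IDEAL)
    return d_neg / (d_pos + d_neg)


def rank_hotels(hotels: Dict[str, Hotel]) -> List[str]:
    """Hotel names sorted by decreasing Close Index."""
    return sorted(hotels, key=lambda name: close_index(hotels[name]), reverse=True)


hotels = {
    "hotel_a": ((0.9, 0.05), (0.8, 0.1, 0.1), 0.9),   # hypothetical values
    "hotel_b": ((0.3, 0.5), (0.2, 0.3, 0.5), 0.4),
}
print(rank_hotels(hotels))
```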


3.5. The process of recommendation 


In this part, we describe the process of the hotel recommendation mechanism, which achieves a personalized search engine on the basis of the methods discussed above. The proposed approach comprises the following steps:


Step 1: Obtain the alternative hotels. Use the search engine on the e-commerce platform to filter the hotels based on the keywords entered by the target user.


Step 2: Identify the similar group for the target user, according to the users’ historical records and the similar group identification method proposed in Section 3.2. For each pair of users, calculate the similarity using Equations (6)-(9), and divide the users into three groups.


Step 3: Calculate the weight of each group according 
to Equation (13). 


Step 4: Calculate hotel evaluation values under three 
criteria: price, rating, online review based on the 
methods proposed in Section 3.3. 


Step 5: Use the TOPSIS method to rank the alterna- 
tives according to Section 3.4. Bigger values of Close 
Index are associated with higher ranking. 


Step 6: Recommend the hotels based on the ranking 
list in step 5 to the target user. 


4. A case study based on Tripadvisor.com 


A case study is conducted to validate the efficiency and applicability of the proposed method for recommendation problems on e-commerce websites.


4.1. Dataset 


In this section, we use data collected from Tripadvisor.com, a travel-sharing site where users can comment on hotels they have stayed in, attractions they have visited and food they have eaten. In this case study, we assume that the target user wants to book a hotel in London and searches for the keyword “London hotel”. We then randomly select 10 hotels from the list recommended by the search engine as alternative hotels and collect basic information about them. For each hotel, we collect 20 users’ ratings and online reviews, and we use these data as the test set. Besides, for each user we collect the user’s information and historical comment records about hotels, which are used as the training set.

Table 1 displays the information about the 10 hotels. The first column is the name of the hotel, the second is the hotel’s ranking among the 10 hotels, the third is the hotel’s ranking among all London hotels according to the search engine, the fourth is the hotel’s price, the fifth is $P_i$, the linguistic term of the price, and the sixth is the hotel’s overall rating. Table 2 shows the users’ information: the first column is the user’s name, the second is the number of hotels on which the user has commented, the third is the number of other users who consider the user’s comments helpful, and the fourth and fifth columns are the original and standardized trust of the user on Tripadvisor.com. Table 3 is a sample of the detailed records of users’ comments on hotels. Since the online reviews and hotel descriptions contain too much text, they are not shown here. In Table 3, the first column is the user’s ID, the second is the hotel on which the user commented, the third is the user’s rating of the hotel, and the fourth and fifth columns show the price and city of the hotel.


Table 1
Alternative hotels for recommendation

Hotel                R1    R2     price    P_i       rating
Hotel 41             1     1      4274     s_2       5
Milestone Hotel      2     3      6978     s_4       …
Montcalm Hotel       3     8      2891     s_2       …
Haymarket Hotel      4     10     14937    s_4       …
Four Seasons Hotel   5     …      7408     s_4       …
The Dorchester       6     37     5783     s_3       …
The Capital Hotel    7     42     3070     …         …
Ampersand Hotel      8     47     2795     s_0       4.5
Venture Hostel       9     690    241      s_{-4}    3.5
Euro Queens Hotel    10    922    430      s_{-4}    2.5


Table 2
A sample of users’ information

User           number    vote    t_u       trust_u
brooket986     3         1       0.3333    0.0238
chickenlips    2         2       1         0.0714
Zena B         13        12      0.9231    0.0659
Ronae F        1         0       0         0

4.2. Result 


Firstly, we identify users’ attributes of interest, trust and consumption capacity, respectively. The trust attribute is standardized from the fourth column of Table 2, and the standardized trust of each user is shown in the fifth column of Table 2.


Table 3
A sample about the detail records of users commented on hotels

User ID    hotel                rating    price    City of hotel
3          Oakley Hall Hotel    4         1393     Hampshire
3          The Savoy            5         3579     London
3          Flemings Mayfair     5         2100     London
5          Kahala Hotel         5         3876     Honolulu
5          Milestone Hotel      5         3093     London
198        Disneyland Hotel     5         1585     Hong Kong
198        Hotel B              3         542      …


Table 4
A part of results about the user similarity

User ID    S1    S2        S3    S4
2          0     0.0714    0     0.02381
3          0     0.0659    0     0.021978
4          0     0.1428    0     0.047619
5          0     0.0396    0     0.013228


Fig. 4. The hotel’s price distribution.


Table 5
The results about similarity groups

User ID    G1                 G2                  G3
1          2,3,9,13,14,19     5,8,11,12,16,18     4,6,7,10,15,17,20
2          1,3,9,13,14,19     5,8,11,12,16,18     4,6,7,10,15,17,20
3          2,3,5,7,14,20      6,8,9,12,13,19      4,10,11,15,16,17,18


Table 6
The ranking of each hotel for users

           Hotel
User ID    1    2    3    4    5    6    7    8    9    10
1          4    7    1    8    6    5    3    2    9    10
2          4    7    1    8    6    5    3    2    9    10
3          4    8    2    7    6    5    1    3    9    10
4          4    7    1    8    6    5    3    2    9    10
5          1    5    4    6    8    2    3    7    9    10
28         5    2    7    3    4    1    6    8    9    10
43         …    …    …    …    …    …    …    …    …    …
65         4    7    3    8    6    5    1    2    9    10
97         3    4    7    2    5    1    6    8    9    10
101        1    5    8    6    4    2    3    7    9    10
121        4    6    1    7    8    5    3    2    9    10
…          …    …    …    …    …    …    …    …    …    …
161        6    8    1    4    10   7    5    3    2    9
181        5    10   3    8    9    7    6    4    2    …


Based on the online reviews and hotel descriptions,
the keywords used to represent the users’ interests are
displayed in Fig. 3; the larger the size of a word, the
higher its frequency.

The consumption capacity of the user can be
obtained from the data in the fourth column in Table 3.
We compare the hotels’ price distributions in several
cities, as shown in Fig. 4. The price distributions in
different cities are clearly distinct, so the linguistic
terms corresponding to the same price may differ from
city to city.
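A minimal sketch of why the same price can map to different linguistic terms in different cities. It uses a crisp three-term split at the terciles of each city's observed prices, which is a simplifying assumption rather than the paper's fuzzy quantification; the city price samples and the function name `price_term` are hypothetical.

```python
def price_term(price, city_prices):
    """Label a price relative to the prices observed in one city.

    ASSUMPTION: three crisp terms split at the city's terciles. The
    paper's own quantification uses fuzzy theory and is not shown here.
    """
    below = sum(1 for p in city_prices if p < price)
    share = below / len(city_prices)
    if share < 1 / 3:
        return "low"
    if share < 2 / 3:
        return "medium"
    return "high"


# Hypothetical price samples for two cities (not data from the paper).
city_a = [1500, 2100, 2800, 3100, 3600, 3900]
city_b = [300, 450, 540, 700, 900, 1600]

print(price_term(1585, city_a), price_term(1585, city_b))
```

In this toy setting a price of 1585 falls in the lowest third of one city's prices and in the highest third of the other's.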

Then, for each user, we calculate the similarity with
other users based on Equations (6-9) proposed in
Section 3.3. A part of the results is shown in Table 4. The
first column is the user’s ID, and the second to fourth
columns are these users’ similarities with user 1 in
interests, trust, and consumption capacity. The fifth
column is the overall similarity. Because user 1 has
no historical records, the similarities between this user
and the others in interests and consumption capacity are
zero.
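The overall similarity values in Table 4 are numerically consistent with an equal-weight average of the three attribute similarities. The sketch below checks this on the first three sample rows; the averaging form is an assumption inferred from the table, and Equations (6-9) remain the authoritative definition.

```python
def overall_similarity(interest, trust, capacity):
    """Equal-weight mean of the three attribute similarities (assumed form)."""
    return (interest + trust + capacity) / 3.0


# Sample rows of Table 4: user ID -> (interest, trust, consumption capacity).
TABLE_4 = {
    2: (0.0, 0.0714, 0.0),
    3: (0.0, 0.0659, 0.0),
    4: (0.0, 0.1428, 0.0),
}
# Overall similarity reported in the last column of Table 4.
REPORTED = {2: 0.02381, 3: 0.021978, 4: 0.047619}

for uid, parts in TABLE_4.items():
    value = overall_similarity(*parts)
    print(uid, round(value, 4), "reported:", REPORTED[uid])
```

The small differences in the later decimals come from the rounding of the trust values printed in the table.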


Then, for a target user and for each item, the users
who have evaluated the hotel are divided into
three groups. A part of the results is displayed in
Table 5. The first column is the user ID, and the
second to fourth columns list the users belonging
to the similar group, the weakly similar group and the
dissimilar group, respectively.

Then, based on the method proposed in Section 3.4,
each user obtains a personalized recommended list
of hotels. The details are displayed in Table 6. In Table 6,
rows 2 to 6 are the 10 hotels’ rankings for users 1 to 5,
who booked the 1st hotel. For each of the remaining
nine hotels, we select one user who has commented
on the hotel, and display the 10 hotels’ rankings for
these users as rows 7 to 15. In the search engine, the
ranking of these 10 hotels is {1, 2, 3, 4, 5, 6, 7, 8, 9,
10} for all users. The result in Table 6 indicates that
the ranking lists of these 10 hotels differ for most
users.
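As an illustration of the ranking step, the following is a minimal sketch of classical (crisp) TOPSIS over the three criteria of price, rating and online review. It is not the fuzzy, group-weighted procedure of Section 3.4: the hotel scores, the weights and the function name `topsis` are all hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness coefficients (larger means better).

    matrix  -- rows are alternatives, columns are criterion scores
    weights -- one weight per criterion
    benefit -- True where larger is better, False for cost criteria
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # Positive and negative ideal solutions, criterion by criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical scores for three hotels: (price, rating, review score).
hotels = ["A", "B", "C"]
matrix = [
    [3579.0, 5.0, 0.9],
    [2100.0, 4.5, 0.8],
    [542.0, 3.0, 0.4],
]
weights = [0.4, 0.3, 0.3]        # illustrative weights only
benefit = [False, True, True]    # price is treated as a cost criterion

scores = topsis(matrix, weights, benefit)
ranking = sorted(zip(hotels, scores), key=lambda p: p[1], reverse=True)
print(ranking)
```

Sorting the closeness coefficients in descending order gives one recommended list; in the proposed method the inputs would differ per user, which is what makes the list personalized.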
Fig. 5. The Precision with the change of n.
Legend: model 1, model 2.


4.3. Discussion 


Tables 4 to 6 show that, for all users, the proposed
method is able to calculate their similarities
with other users, to identify their similar groups and
to provide them with a personalized recommended list.
The feasibility of the proposed hotel recommendation
method has thus been illustrated. Moreover, the results
in Table 6 illustrate that the proposed mechanism, when
used for a search engine, can provide personalized lists
for users.

In the next step, we discuss the accuracy of
the proposed method. In this study, we use the index
of Precision to measure the accuracy. Precision is
the ratio of the items the user liked to all the
recommended items [48].
Because only one hotel per user has been ordered in
the test set, the definition of Precision is adjusted
slightly. In this study, we consider a recommendation
successful when the ranking of the hotel that the
user ordered is in the top-n (where n is the threshold)
among all hotels. Therefore, the Precision is the
proportion of users who receive a successful recommendation.
The calculation of Precision is shown as:


\[ \mathrm{Precision} = \frac{N(s)}{N} \tag{22} \]


where N(s) is the number of successful recommendations,
and N is the total number of recommendations
provided.
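Equation (22) can be sketched in a few lines. The input is, for each user, the position that the recommended list assigns to the hotel the user actually ordered; the example ranks below are hypothetical and the function name `precision_at_n` is illustrative.

```python
def precision_at_n(booked_ranks, n):
    """Fraction of users whose ordered hotel is ranked in the top n.

    booked_ranks -- for each user, the rank (1 = first) given by the
                    recommended list to the hotel that user ordered
    """
    if not booked_ranks:
        return 0.0
    successes = sum(1 for r in booked_ranks if r <= n)   # N(s)
    return successes / len(booked_ranks)                 # N(s) / N


# Hypothetical ranks of the ordered hotel for six users.
ranks = [1, 4, 2, 7, 3, 10]
for n in (1, 3, 5):
    print(n, precision_at_n(ranks, n))
```

Because each user contributes exactly one ordered hotel, this adjusted Precision is a top-n hit rate and can only stay equal or grow as n increases.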

We calculate the Precision of the proposed method
(model 1) and of the search engine currently used in
Tripadvisor.com (model 2). Figure 5 exhibits the Precisions
of the two methods as n changes.

Figure 5 shows that the Precisions of
the proposed method are always greater than those of
model 2. That is to say, the accuracy of the proposed
method is improved compared with the search engine
currently used in Tripadvisor.com.


Fig. 6. The precision after eliminating the users who ordered the
top-a hotels.
Legend: model 1(a), model 2(a), model 1(b), model 2(b), model 1(c), model 2(c).


The hotel’s ranking in the search engine remains
the same for a long period. For the users we collected,
the accuracy of model 2 for users who booked top-ranked
hotels is therefore always higher than that for users who
booked bottom-ranked hotels. For instance, for users who
booked hotel 1, the Precision of model 2 is equal to 1,
while the Precision of model 2 for users who booked
hotel 10 is equal to 0. The Precision is thus extremely
unstable across users with different preferences on booking
hotels. Therefore, we analyze the Precision for
users with different preferences. Figure 6
shows the Precision after eliminating the users who
ordered the top-a hotels (where a is a threshold). For
example, a = 1 means that we calculate
the Precisions only for users who booked hotels 2 to
10. The results of model 1(a) and model 2(a) show
the Precisions of model 1 and model 2, respectively,
when the value of n is 5. The results of model 1(b)
and model 2(b) show the Precisions of model 1 and
model 2, respectively, when the value of n is 3. The
results of model 1(c) and model 2(c) show the Precisions
of model 1 and model 2, respectively, when the
value of n is 1. Consistent with the results in Fig. 5,
the Precisions of the proposed method are always
greater than those of the search engine used in
Tripadvisor.com, whatever the value of n or a. In Fig. 6, we
can observe that the Precisions of model 2 are zero for the
users who prefer the bottom-5 hotels, whatever the value of n.
However, the proposed method has stable performance.
The result further demonstrates the deficiency of the
hotel recommended list generated by the search engine,
and indicates that the proposed method can provide
users with an accurate and personalized hotel recommended
list.
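The behaviour of the fixed search-engine ranking can be checked with a small sketch. It assumes that the ranking is {1, ..., 10} for every user and that each hotel contributes 20 test users, as described in the case study; the function name `baseline_precision` is illustrative, and the numbers are derived from these assumptions rather than read from Fig. 6.

```python
def baseline_precision(n, a, hotels=10, users_per_hotel=20):
    """Precision of a fixed ranking 1..hotels after dropping top-a bookers.

    Hotel k is always ranked k, so a user who booked hotel k counts as a
    success exactly when k <= n. Users who booked hotels 1..a are excluded.
    """
    kept = [k for k in range(a + 1, hotels + 1)
            for _ in range(users_per_hotel)]
    if not kept:
        return 0.0
    return sum(1 for k in kept if k <= n) / len(kept)


for n in (5, 3, 1):
    print(n, [round(baseline_precision(n, a), 3) for a in range(0, 6)])
```

Under these assumptions the baseline Precision drops to zero as soon as a reaches n, which is why a non-personalized list cannot serve users who prefer lower-ranked hotels.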

To go a step further, we discuss the accuracy for
cold start users. The Precisions are presented in
Table 7 (in this part, n = 5). In the same way, the
Precisions of model 1 are greater than those of model 2,
and when a user has one historical record, the Precision
of model 1 is significantly greater than that of model 2.
The results in Table 7 show that the accuracy of the
proposed method is improved for cold start users, too.

Table 7
The precision for cold start users
Number of historical records  model 1  model 2
0                             0.39     0.25
1                             0.67     0.43
2                             0.51     0.39
3                             0.51     0.35


5. Conclusion and future works 


This study proposed a comprehensive method for
hotel recommendation that improves the performance of
search engines and addresses the problem that a search
engine cannot provide a personalized hotel recommended
list. The proposed method considered users’
individualized preferences from the aspects of user
interest, user trust and user consumption capacity, and
divided users into three groups. In addition, compared
with traditional methods, we evaluated
hotels on the criteria of price, rating and online reviews,
which can provide a more precise recommendation
than using a single criterion. For the criteria of rating
and online reviews, we gave different weights to
different groups. For the criterion of price, we considered
the user’s consumption capacity for hotels. In order
to ensure the accuracy of the proposed mechanism,
we proposed methods to quantify the user attributes
and hotel evaluation criteria by using fuzzy theory,
which expresses information more efficiently. Moreover,
we utilized the TOPSIS method to solve the problem. A
case study based on Tripadvisor.com was conducted
to verify the feasibility and efficiency of the proposed
method. The results of the case study illustrated that
the proposed method can achieve personalized
recommendations and improve the accuracy of the search
engine.

There are also some limitations in this study. For
instance, in the actual decision-making process,
many factors affect the decisions of different
consumers, and here we only consider the three most
important ones. In addition, because the method for
quantifying trust is given only for
e-commerce platforms that have a trust evaluation
system, the application of the proposed method is
limited.

In future research, we will explore other factors
that affect the decision-making of different users in
different cases to improve the accuracy of
recommendation. In addition, the quantification methods
for user attributes and hotel evaluation criteria also
need to be further improved. Finally, we will apply
this method in practice in various fields such as movie
recommendation.